
Omitting the superscript $\cdot^t$, we have the $i$-th component of $\frac{\partial \mathbf{A}}{\partial \mathbf{w}}$ as

$$
\left(\frac{\partial \mathbf{A}}{\partial \mathbf{w}}\right)_i =
\begin{bmatrix}
0 & \cdots & 0 & \cdots & 0 \\
\vdots & & \vdots & & \vdots \\
\frac{\partial A_{i,i}}{\partial w_{i,1}} & \cdots & \frac{\partial A_{i,i}}{\partial w_{i,j}} & \cdots & \frac{\partial A_{i,i}}{\partial w_{i,J}} \\
\vdots & & \vdots & & \vdots \\
0 & \cdots & 0 & \cdots & 0
\end{bmatrix},
\qquad (3.126)
$$

we can derive

$$
\mathbf{w}\hat{G}(\mathbf{w}, \mathbf{A}) =
\begin{bmatrix}
w_1\hat{g}_1 & \cdots & w_1\hat{g}_i & \cdots & w_1\hat{g}_I \\
\vdots & & \vdots & & \vdots \\
w_I\hat{g}_1 & \cdots & w_I\hat{g}_i & \cdots & w_I\hat{g}_I
\end{bmatrix}.
\qquad (3.127)
$$

Combining Eq. 3.126 and Eq. 3.127, we get

$$
\mathbf{w}\hat{G}(\mathbf{w}, \mathbf{A})\left(\frac{\partial \mathbf{A}}{\partial \mathbf{w}}\right)_i =
\begin{bmatrix}
w_1\hat{g}_i\frac{\partial A_{i,i}}{\partial w_{i,1}} & \cdots & w_1\hat{g}_i\frac{\partial A_{i,i}}{\partial w_{i,j}} & \cdots & w_1\hat{g}_i\frac{\partial A_{i,i}}{\partial w_{i,J}} \\
\vdots & & \vdots & & \vdots \\
w_i\hat{g}_i\frac{\partial A_{i,i}}{\partial w_{i,1}} & \cdots & w_i\hat{g}_i\frac{\partial A_{i,i}}{\partial w_{i,j}} & \cdots & w_i\hat{g}_i\frac{\partial A_{i,i}}{\partial w_{i,J}} \\
\vdots & & \vdots & & \vdots \\
w_I\hat{g}_i\frac{\partial A_{i,i}}{\partial w_{i,1}} & \cdots & w_I\hat{g}_i\frac{\partial A_{i,i}}{\partial w_{i,j}} & \cdots & w_I\hat{g}_i\frac{\partial A_{i,i}}{\partial w_{i,J}}
\end{bmatrix}.
\qquad (3.128)
$$

The $i$-th component of the trace term in Eq. 6.72 is then calculated by

$$
\mathrm{Tr}\!\left[\mathbf{w}\hat{G}\left(\frac{\partial \mathbf{A}}{\partial \mathbf{w}}\right)_i\right]
= w_i\hat{g}_i\sum_{j=1}^{J}\frac{\partial A_{i,i}}{\partial w_{i,j}}.
\qquad (3.129)
$$
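To make the block structure of Eqs. 3.126-3.129 concrete, here is a minimal NumPy sketch (not the authors' code) that builds $(\partial\mathbf{A}/\partial\mathbf{w})_i$ with a single nonzero row, forms the product of Eq. 3.128, and evaluates the right-hand side of Eq. 3.129. The sizes $I$, $J$, the index $i$, and the values of $\mathbf{w}$, $\hat{g}$, and the partial derivatives are random placeholders, and $w_k$ and $\hat{g}_m$ are treated as scalars purely for illustration.

```python
import numpy as np

# Random placeholders; I, J, i, w, g_hat, and dA_ii stand in for the filter
# count, filter size, component index, w, g-hat, and the partials dA_{i,i}/dw_{i,j}.
I, J, i = 4, 5, 2
w = np.random.randn(I)                   # [w_1, ..., w_I], treated as scalars here
g_hat = np.random.randn(I)               # [g_hat_1, ..., g_hat_I]
dA_ii = np.random.randn(J)               # [dA_{i,i}/dw_{i,1}, ..., dA_{i,i}/dw_{i,J}]

# (dA/dw)_i from Eq. 3.126: an I x J matrix whose only nonzero row is row i.
dA_dw_i = np.zeros((I, J))
dA_dw_i[i] = dA_ii

# w * G_hat(w, A) from Eq. 3.127: entry (k, m) is w_k * g_hat_m.
wG = np.outer(w, g_hat)

# Product of Eq. 3.128: entry (k, j) collapses to w_k * g_hat_i * dA_{i,i}/dw_{i,j},
# because only row i of (dA/dw)_i is nonzero.
prod = wG @ dA_dw_i
assert np.allclose(prod, np.outer(w * g_hat[i], dA_ii))

# Right-hand side of Eq. 3.129: the i-th component of the trace term.
trace_i = w[i] * g_hat[i] * dA_ii.sum()
```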

Combining Eq. 6.72 and Eq. 3.129, we can get

$$
\hat{\mathbf{w}}^{t+1} = \mathbf{w}^{t+1} - \eta_2\lambda
\begin{bmatrix}
\hat{g}^t_1\sum_{j=1}^{J}\frac{\partial A^t_{1,1}}{\partial w^t_{1,j}} \\
\vdots \\
\hat{g}^t_I\sum_{j=1}^{J}\frac{\partial A^t_{I,I}}{\partial w^t_{I,j}}
\end{bmatrix}
\circ
\begin{bmatrix}
w^t_1 \\
\vdots \\
w^t_I
\end{bmatrix}
= \mathbf{w}^{t+1} + \eta_2\lambda\,\mathbf{d}^t \circ \mathbf{w}^t,
\qquad (3.130)
$$

where $\eta_2$ is the learning rate of the real-valued weight filters $\mathbf{w}_i$, and $\circ$ denotes the Hadamard product. We take $\mathbf{d}^t = -\big[\hat{g}^t_1\sum_{j=1}^{J}\frac{\partial A^t_{1,1}}{\partial w^t_{1,j}}, \cdots, \hat{g}^t_I\sum_{j=1}^{J}\frac{\partial A^t_{I,I}}{\partial w^t_{I,j}}\big]^T$, which is unsolvable and undefined in the backpropagation of BNNs.
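A small sketch (assuming each $\mathbf{w}_i$ is a filter of $J$ weights stored as a row of an $I\times J$ array) may clarify the shape of the update in Eq. 3.130: $\mathbf{d}^t$ carries one scalar per output filter, and the Hadamard product broadcasts it over that filter's $J$ weights. The values of $\eta_2$, $\lambda$, and $\mathbf{d}^t$ below are placeholders, since, as just noted, the true $\mathbf{d}^t$ cannot be obtained through BNN backpropagation.

```python
import numpy as np

# Placeholder quantities only; this illustrates the broadcasting in Eq. 3.130,
# not a way to actually compute d^t.
I, J = 4, 5
eta2, lam = 1e-3, 1e-4                  # assumed values for eta_2 and lambda
w_next = np.random.randn(I, J)          # stand-in for w^{t+1}
w_prev = np.random.randn(I, J)          # stand-in for w^t
d = np.random.randn(I)                  # placeholder for d^t (one scalar per filter)

# Channel-wise Hadamard product: d^t_i scales every weight of filter i.
w_hat_next = w_next + eta2 * lam * d[:, None] * w_prev
```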

To address this issue, we employ a recurrent model to approximate $\mathbf{d}^t$ and have

$$
\hat{\mathbf{w}}^{t+1} = \mathbf{w}^{t+1} + \mathbf{U}^t \circ \mathrm{DReLU}(\mathbf{w}^t, \mathbf{A}^t),
\qquad (3.131)
$$

and

$$
\mathbf{w}^{t+1} \leftarrow \hat{\mathbf{w}}^{t+1},
\qquad (3.132)
$$

where we introduce a hidden layer with channel-wise learnable weights $\mathbf{U} \in \mathbb{R}^{C_{out}}_{+}$ to recurrently backtrack $\mathbf{w}$. We present DReLU to supervise this optimization process and realize a controllable recurrent optimization. Channel-wise, we implement DReLU as

$$
\mathrm{DReLU}(\mathbf{w}_i, A_i) =
\begin{cases}
\mathbf{w}_i & \text{if } \big(\neg D(\mathbf{w}_i)\big) \wedge D(A_i) = 1, \\
0 & \text{otherwise},
\end{cases}
\qquad (3.133)
$$
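As an illustration, the following sketch implements the channel-wise rule of Eqs. 3.131-3.133 under stated assumptions: the discriminant $D(\cdot)$ is not defined in this excerpt, so the D below is a hypothetical placeholder (one Boolean per output channel); the function names drelu and recurrent_backtrack, the shapes, and all values are likewise assumptions rather than the authors' implementation.

```python
import numpy as np

def D(x):
    # Hypothetical discriminant: one Boolean per output channel. The actual
    # D(.) accompanying Eq. 3.133 is not reproduced in this sketch.
    x = x.reshape(x.shape[0], -1)
    return np.linalg.norm(x, axis=1) > 1.0

def drelu(w, A):
    # DReLU (Eq. 3.133): keep filter w_i only when (not D(w_i)) and D(A_i) hold.
    gate = (~D(w)) & D(A)
    return w * gate[:, None]

def recurrent_backtrack(w_next, w_prev, A_prev, U):
    # Eqs. 3.131-3.132: w^{t+1} <- w^{t+1} + U^t o DReLU(w^t, A^t),
    # with U applied channel-wise (one non-negative weight per filter).
    return w_next + U[:, None] * drelu(w_prev, A_prev)

# Toy usage with assumed shapes: C_out filters, each with J real-valued weights.
C_out, J = 4, 5
w_next = np.random.randn(C_out, J)       # w^{t+1} from the ordinary gradient step
w_prev = np.random.randn(C_out, J)       # w^t
A_prev = np.abs(np.random.randn(C_out))  # diagonal of A^t, one scale per channel
U = np.abs(np.random.randn(C_out))       # U^t in R_+^{C_out}
w_next = recurrent_backtrack(w_next, w_prev, A_prev, U)
```

In practice, $\mathbf{U}^t$ would be a learnable, non-negative parameter updated during training rather than the random stand-in used above.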